Multiobjective Clustering with Automatic k-determination for Large-scale Data: Scalable automatic determination of the number of clusters
Authors
Abstract
Web mining, that is, data mining applied to web data, is a key component of web technologies. In particular, web behavior mining has recently attracted considerable attention. Behavior mining involves analyzing user behavior, finding patterns in it, and predicting users' subsequent behavior or interests; it is used in web advertising systems and content recommendation systems. Huge volumes of data, such as web data, are usually analyzed with data-clustering techniques. Data clustering separates data into groups according to similarity and is usually applied as the first step of data mining.

In the present study, we developed a scalable data-clustering algorithm for web mining based on an existing evolutionary multiobjective clustering algorithm: multiobjective clustering with automatic k-determination (MOCK). MOCK has been reported to outperform k-means, agglomerative methods, and other evolutionary clustering algorithms, and it can find the appropriate number of clusters using information from the trade-off curve between its objectives. MOCK's k-determination scheme is powerful and rigorous, but its computational cost is too high for clustering huge datasets. In this paper, we propose a scalable automatic k-determination scheme that reduces the size of the Pareto set while usually still determining the appropriate number of clusters.
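The abstract does not spell out how MOCK or the proposed scalable scheme work internally. As a rough illustration only, the sketch below shows the general shape of such a pipeline under several assumptions: candidate partitions are scored on the two objectives used in the original MOCK formulation, overall deviation (compactness) and connectivity (penalty 1/j when a point's j-th nearest neighbour is in another cluster); the non-dominated (Pareto) solutions are kept; and k is read off the trade-off curve. The candidate generator (plain k-means for k = 1..k_max, in place of MOCK's evolutionary search), the knee rule (largest distance from the chord joining the front's end points), the fallback for tiny fronts, the toy data, and all names (`select_k`, `knee_k`, `BLOBS`, etc.) are hypothetical choices for this sketch, not the authors' method. The neighbour search is quadratic, so it deliberately ignores the scalability problem the paper addresses.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Solution = Tuple[int, float, float]  # (k, overall deviation, connectivity)


def centroid(members: Sequence[Point]) -> Point:
    return tuple(sum(coords) / len(members) for coords in zip(*members))


def overall_deviation(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Compactness objective: summed distance of each point to its cluster centroid."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        mu = centroid(members)
        total += sum(math.dist(p, mu) for p in members)
    return total


def connectivity(points: Sequence[Point], labels: Sequence[int], L: int = 3) -> float:
    """Connectedness objective: add 1/j whenever a point's j-th nearest
    neighbour (j = 1..L) lies in a different cluster."""
    total = 0.0
    for i, p in enumerate(points):
        # Other points ordered by distance, ties broken by index (O(n^2 log n) overall).
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: (math.dist(p, points[j]), j))
        for rank, j in enumerate(order[:L], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank
    return total


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Hypothetical candidate generator: Lloyd's k-means, farthest-point seeding."""
    centers: List[Point] = [points[0]]
    while len(centers) < k:
        # Seed the next centre at the point farthest from all current centres.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


def pareto_front(solutions: Sequence[Solution]) -> List[Solution]:
    """Keep solutions not dominated on (deviation, connectivity), both minimised."""
    def dominates(a: Solution, b: Solution) -> bool:
        return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])
    front = [s for s in solutions if not any(dominates(o, s) for o in solutions)]
    return sorted(front, key=lambda s: s[0])


def knee_k(front: Sequence[Solution]) -> int:
    """Hypothetical k-selection: the front point farthest from the chord joining
    the two end points, after min-max normalising both objectives."""
    if len(front) < 3:
        return front[0][0]  # no interior point: fall back to the smallest k

    def scale(vals: List[float]) -> List[float]:
        lo, hi = min(vals), max(vals)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in vals]

    pts = list(zip(scale([s[1] for s in front]), scale([s[2] for s in front])))
    (x1, y1), (x2, y2) = pts[0], pts[-1]
    chord = math.hypot(x2 - x1, y2 - y1)
    if chord == 0.0:
        return front[0][0]
    # Perpendicular distance of each normalised point from the chord.
    gaps = [abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / chord for x, y in pts]
    return front[gaps.index(max(gaps))][0]


def select_k(points: Sequence[Point], k_max: int, L: int = 3) -> int:
    candidates = []
    for k in range(1, k_max + 1):
        labels = kmeans(points, k)
        candidates.append((k, overall_deviation(points, labels), connectivity(points, labels, L)))
    return knee_k(pareto_front(candidates))


# Hypothetical toy data: three well-separated 2x2 grids of points.
BLOBS: List[Point] = [(float(x + dx), float(y + dy)) for (x, y) in [(0, 0), (10, 0), (0, 10)]
                      for (dx, dy) in [(0, 0), (0, 1), (1, 0), (1, 1)]]

if __name__ == "__main__":
    print(select_k(BLOBS, k_max=4))  # prints 3
```

On the toy data, partitions with k below 3 merge blobs without lowering connectivity, so they are dominated, and the front starts at k = 3. As described in the original MOCK publication, MOCK's own k-determination instead compares the solution front with control fronts computed on random reference data; the chord-distance knee here is only a cheap, generic substitute.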
Similar resources
Improvement of density-based clustering algorithm using modifying the density definitions and input parameter
Clustering is one of the main tasks in data mining, which means grouping similar samples. In general, there is a wide variety of clustering algorithms. One of these categories is density-based clustering. Various algorithms have been proposed in this category; one of the most widely used is DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...
Automatic Data Clustering Using an Improved Imperialist Competitive Algorithm
The Imperialist Competitive Algorithm (ICA) is considered a prime metaheuristic for finding the global optimal solution in optimization problems. This paper presents a use of ICA for automatic clustering of huge unlabeled datasets. By using a suitable structure for each chromosome together with the ICA, the proposed method (ACICA) finds the optimal number of clusters at run time while optim...
Data Clustering Based on Key Identification
Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pairwise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and their associated cluster structures. The recent trend toward big data analysis imposes a more demanding requirement: new clustering algorithms must be both scalable and accura...
Fuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition
In this paper, clustering algorithms are proposed for data fusion at the decision level. The results of automatic isolated word recognition, derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined with each other using fuzzy clustering algorithms, especially fuzzy k-means and fuzzy vector quantization. Experimental results show that the...
Data Clustering Using A New CGA (Chaotic-Genetic Algorithm) Approach
Clustering is the process of dividing a set of input data into a number of subgroups. The members of each subgroup are similar to each other but different from members of other subgroups. The genetic algorithm has been widely applied to data clustering, one application being the clustering of images. The problem with earlier image-clustering methods lay in selecting in...
Publication date: 2007